Large Scale Reinforcement Learning using Q-SARSA(λ) and Cascading Neural Networks

Author

  • Steffen Nissen
Abstract

This thesis explores how the novel model-free reinforcement learning algorithm Q-SARSA(λ) can be combined with the constructive neural network training algorithm Cascade 2, and how this combination can scale to the large problem of backgammon. For reinforcement learning to scale to larger problem sizes, it needs to be combined with a function approximator such as an artificial neural network. Reinforcement learning has traditionally been combined with simple incremental neural network training algorithms, but more advanced training algorithms such as Cascade 2 exist that have the potential to achieve much higher performance. All of these advanced training algorithms are, however, batch algorithms, and since reinforcement learning is incremental, this poses a challenge. So far, the potential of the advanced algorithms has not been fully exploited, and the few combinations that have been tested have failed to produce a solution that can scale to larger problems. The standard reinforcement learning algorithms used in combination with neural networks are Q(λ) and SARSA(λ), which for this thesis have been combined to form the Q-SARSA(λ) algorithm. This algorithm has been combined with the Cascade 2 neural network training algorithm, which is especially interesting because it is a constructive algorithm that can grow a neural network by gradually adding neurons. To combine Cascade 2 and Q-SARSA(λ), two new methods have been developed: the NFQ-SARSA(λ) algorithm, which is an enhanced version of Neural Fitted Q Iteration, and the novel sliding window cache. The sliding window cache and Cascade 2 are tested on the medium-sized mountain car and cart pole problems and on the large backgammon problem. The test results show that Q-SARSA(λ) performs better than Q(λ) and SARSA(λ), and that the sliding window cache in combination with Cascade 2 and Q-SARSA(λ) performs significantly better than incrementally trained reinforcement learning.
For the cart pole problem the algorithm performs especially well: it learns a policy that can balance the pole for the complete 300 steps after only 300 episodes of learning, and its resulting neural network contains only one hidden neuron. This should be compared to 262 steps for the incremental algorithm after 10,000 episodes of learning. The sliding window cache scales well to the large backgammon problem and wins 78% of the games against a heuristic player, while incremental training wins only 73% of the games. The NFQ-SARSA(λ) algorithm also outperforms the incremental algorithm on the medium-sized problems, but it is not able to scale to backgammon. The sliding window cache in combination with Cascade 2 and Q-SARSA(λ) performs better than incrementally trained reinforcement learning for both medium-sized and large problems, and it is the first combination of advanced neural network training algorithms and reinforcement learning that can scale to larger problems.
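
The abstract says that Q(λ) and SARSA(λ) have been combined to form Q-SARSA(λ), but it does not state the update rule. The sketch below shows one plausible way such a combination could look for a single step, without eligibility traces: the target blends the greedy next-state value (as in Q-learning) with the value of the action actually taken (as in SARSA). The blending weight `sigma`, the function names, and the default parameter values are assumptions made for illustration and are not taken from the abstract.

```python
from typing import Sequence


def q_sarsa_target(reward: float,
                   next_q: Sequence[float],
                   next_action: int,
                   gamma: float = 0.9,
                   sigma: float = 0.5) -> float:
    """One-step target that blends the Q-learning and SARSA targets.

    sigma is a hypothetical blending weight (an assumption):
    sigma = 0 gives the Q-learning target, sigma = 1 the SARSA target.
    """
    greedy_value = max(next_q)              # Q-learning: best next action
    on_policy_value = next_q[next_action]   # SARSA: action actually taken
    blended = (1.0 - sigma) * greedy_value + sigma * on_policy_value
    return reward + gamma * blended


def q_sarsa_update(q_value: float,
                   reward: float,
                   next_q: Sequence[float],
                   next_action: int,
                   alpha: float = 0.1,
                   gamma: float = 0.9,
                   sigma: float = 0.5) -> float:
    """Move the current estimate a fraction alpha toward the blended target."""
    target = q_sarsa_target(reward, next_q, next_action, gamma, sigma)
    return q_value + alpha * (target - q_value)
```

With `sigma` set to 0 or 1 the sketch reduces to the ordinary one-step Q-learning or SARSA update, which makes it easy to compare the three variants on the same problem.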

Similar resources

Large Scale Reinforcement Learning using Q-SARSA(λ) and Cascading Neural Networks M.Sc. Thesis

This thesis explores how the novel model-free reinforcement learning algorithm Q-SARSA(λ) can be combined with the constructive neural network training algorithm Cascade 2, and how this combination can scale to the large problem of backgammon. In order for reinforcement learning to scale to larger problem sizes, it needs to be combined with a function approximator such as an artificial neural n...

Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models

Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cros...

Reinforcement Learning in Neural Networks: A Survey

In recent years, research on reinforcement learning (RL) has focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. Using neural networks enables RL to search for optimal policies more efficiently in several real-life applicat...

A Comparison of Neural Network Architectures in Reinforcement Learning in the Game of Othello

Declaration This thesis contains no material which has been accepted for the award of any other degree or diploma in any tertiary institution, and to the best of my knowledge and belief, contains no material previously published or written by another person, except where due reference is made in the text of the thesis. Abstract Over the past two decades, Reinforcement Learning has emerged as a ...

Publication date: 2007